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Abstract 

We present the first tests and results from a study of QCD with 
two flavours of dynamical Wilson fermions using the Hybrid Monte 
Carlo Algorithm (HMCA) on APEIOO machines. 

Initially we have tested the algorithm without fermions to pro- 
duce SU(3) pure gauge configurations. The simulations have been 
performed on a 6^ lattice at /3 = 5.7 and /3 = 6.0. 

Then we started with the full QCD simulations. The HMCA pa- 
rameters have been tuned using a wide class of tests performed on a 
8^ and on 12^ x 32 lattices. First results for the mesons correlations 
functions on 12^ x 32 lattice are obtained at /3 = 5.3. 

In this paper we are briefly facing some technical aspects: we dis- 
cuss about the inversion algorithm for the fermionic operator, we un- 
derline the methods used to overcome the problems arising using a 
32 bits machine and we discuss the implementation of a new random 
number generator for APEIOO machines. 

Finally we propose different scenarios for the simulation of phys- 
ical observables, with respect to the memory capacity and speed of 
different APEIOO configurations. 

ROME Prep. 93/972 
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1 Introduction. 



Numerical simulations of Quantum Field Theories on the lattice by means of 
Monte Carlo methods have become an important tool to understand and pre- 
dict non-perturbative phenomena in particle physics. The quantitative study 
of some aspects of quantum field theories, for example the determination of 
the hadronic mass spectrum of QCD, requires non-perturbative approach as 
such as the lattice one. 

Up to now most of the numerical simulations in lattice QCD have been 
performed in the quenched approximation and results on large lattices with 
high statistics have been obtained^. In this drastic and uncontrolled ap- 
proximation the QCD calculations are performed neglecting the contribution 
of the sea quarks. The only way to estimate the quenched systematic error is 
to repeat the calculations with dynamical sea quarks, but these simulations 
require one or two orders of magnitude more CPU time than the quenched 
ones. 

In the last years some groups have started the simulations of full QCD 
using the Hybrid Monte Carlo algorithm^ both for staggered and for Wilson 
dynamical fermions'S. In the case of the simulations of QCD with two de- 
generate flavours of Wilson fermions the mass spectrum and decay constants 
calculations^ are restricted to 16^ spatial lattice size, with (3 = 5.3 — 5.6 and 
up to (^)^ ~ 0.3. There are no results about the glueballs mass spectrum 
for Wilson dynamical fermions. 

In these full QCD calculations the main sources of errors have been found 
to be the correlations between different molecular dynamics trajectories00 
and finite size effects0. There are preliminary studies on the correlations for 
Wilson dynamical fermions, while the finite size effects have been studied, 
up to now, only for staggered fermions. 

In order to control all possible systematic and statistical errors in a real- 
istic full QCD simulation, a great amount of CPU time and a large memory 
capacity are required. These are the problems that caused the develop- 
ment of QCD-dedicated parallel supercomputers!!'. The APEIOO parallel 
machinesH promise to be a good tool to face the problem of unquenched 
QCD simulations. 

APEIOO is a synchronous SIMD machine, composed of a 3-dimensional 
cubic mesh of processors, with periodic boundary conditions. Each node 
contains a floating point processor (MAD) , a memory bank of IM 32-bit 
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words and a network interface device. The main advantage of this simple 
structure is modularity: the mesh can be enlarged to obtain more processing 
power. A scalar CPU takes care of the program flow and of all integer 
operations that essentially amount to address computation and do-loop book- 
keeping. The MAD processor operates on IEEE 32-bit floating point number 
(single precision). 

The results presented in this paper are obtained using the single board 
configuration (8 nodes disposed on a 2 x 2 x 2 cube), with a peak speed 
of 400 Mfiops and the "tube" configuration (128 nodes disposed on a 2 x 
2 X 32 lattice) with a peak speed of 6.4 Gfiops. The next available APEIOO 
configurations will be the 25 Gfiops "tower" (512 nodes in a 8 x 8 x 8 lattice) 
and the full 100 Gflop APEIOO with 2048 nodes arranged on a 16 x 16 x 8 
lattice. 

In this paper we present the implementation of the HMCA on the APEIOO 
machines, the tests performed and the QCD results obtained up to now. In 
section 2. we review the basic formalism of unquenched QCD, then in section 
3. we present the Hybrid Monte Carlo algorithm implemented on APEIOO 
and we underline the parameters' tuning required and the tests which can 
be performed. A short technical discussion about the single precision as- 
pects, the inversion algorithms and the random number generator used will 
be found in section 6. A detailed analysis of the technical aspects of the 
implementation of the HMCA on the APEIOO machines and of the perfor- 
mances obtained will be presented separatelyQ. 

In section 4. we present the tests and the results obtained using the 
HMCA in the quenched approximation on 6^^ lattices at /3 = 5.7 and (3 = 6.0, 
while in section 5. we present the results for the full QCD case. The latter 
have been obtained on 8^ and on 12^ x 32 lattices at f5 = 5.3, with values of 
Wilson hopping parameter ksea lower than that corresponding to the strange 
one. Finally in section 7. we report our conclusions and discuss the full 
QCD simulations that can be faced with the different APEIOO machines 
configurations. 



2 Unquenched QCD 

The greatest obstacle to the simulation of QCD on a lattice is the inclu- 
sion of dynamical fermions. Most of the simulations of lattice QCD have 
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ignored the contributions of sea quarks by using the quenched (or "valence" ) 
approximation. The QCD Lagrangian is 



Nf 

Lqcd = LYM + Y.^fiP + ^)^f ^ (2-1) 

/=1 

where Lym = —^F^'^"-F^'^"-, f is the flavour index and D is the usual covariant 
derivative. The Feynman path integral for the partition function Z is: 

Z = j VA^Vi)Vi)e-^'^^''^'^''° . (2.2) 

The integration over the fermionic degrees of freedom gives 

Z = j VA^[det{ p + m)]^/e-/'^'^^^^ . (2.3) 

In the quenched approximation the number of flavours, Nf is set to zero. 
Consequently, in terms of perturbative approach, in this approximation one 
does not consider sea quark contribution to physical observables, neglecting 
virtual quark loops and treating fermions as static degrees of freedom. 

It will be interesting to study the effects of the sea quarks on all the 
QCD observables, and particularly on that ones that are supposed to present 
strong disc rep ancies with the quenched results, such as the cr-term of the 
nucleon0'll3, or the the (vr — rj') mass splitting. Of course, if one could 
reach the physical iriT^/mp value, the most interesting observable would be 
the branching ratio of the p — *^ 27r decay. Up to now the numerical full QCD 
studies are performed at high quark masses, with statistic not comparable 
with the quenched one. 

In order to go beyond the quenched approximation, we need to calculate 
the non-local fermionic determinant. A standard way to compute a deter- 
minant requiresHS about ~ OiV) arithmetic operations so the use of 



standard Monte Carlo algorithms (like Metropolis, which requires the calcu- 
lations of the fermionic determinant for the updating of each link) is practi- 
cally impossible. 

Taking into account the effects of the sea quarks one can write the 
fermionic determinant by the introduction of a path integral over the pseu- 
dofermionic fields 0. Let call M = ( ^ + m) , 
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(2.5) 



= J VA^ e-/'^'"^^^ I D0D0*e-^*[^'^]"^* . (2.6) 
On the lattice the previous path integral becomes 

jvU V(PV(P* e-^G{u)-.p[MiM]-^r (2.7) 



where Sc{U) is the standard SU{3) Wilson action for the gauge fields, 

/5 



Sg{U) 



N, 



Y.tr{I - Re{Uu)) 



col □ 



(2.8) 



and the sum is over the plaquettes 

Uu^U^{x)U,{x + lJi)Ul{x + v)Ul{x) . 



(2.9) 



One observable which is generally calculated also in full QCD simulations 
is the plaquette ir(i?e(C/n)) that is related to the pure gauge energy as 
we can see from eq.(2.8). 

We have used the following formulation of the fermionic operator M, that 
appears in the lattice action (2.7): 



M{x,y) ^5^^y-kY, i'i-+ln)Ul{x)5^^y^f,+{l--f^)U^{x-n)5^^y+^ 



. (2.10) 



In the next section we discuss the algorithm that we use to generate 
full QCD configurations of link fields {U^{x)} distributed according to the 
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measure present in the path integral of eq.(2.7). Given a set of these {U^{x)} 
configurations one can perform a calculation of the quark propagator in order 
to study the hadronic mass spectrum and the meson decay constants. 

For example to perform a study of the meson properties we have to com- 
pute the following correlation functions: 

^55^ = E < P5{x,t)Pi{0,0) > (2.11) 

for pseudoscalar meson and 

Gkkit) = E E < Vk{x,t)vH0,0) > (2.12) 

fe=l,3 X 

for vector mesons. P5, Vk are the local pseudoscalar density and vector 
currents respectively: 

P^ix, t) = #(f , i)75V'(f , t) (2.13) 

T4(f,t)=V;(f,i)7feV(f,i) (2.14) 

where i/j is the spinor corresponding to the quark with Wilson hopping pa- 
rameter k, which we will call k^aience, which can be equal or different from the 
ksea value. Usually it is interesting to extract the hadronic mass spectrum 
for different value of kyaience, at fixed ksea in order to study the chiral limit. 

In order to extract the meson masses one can fit the two point correlation 
functions in the eqs.(2.11-2.12) to the expressions: 

Z- T T 

Gi{t) = -^e-^'^cosh{Mi{ t)), i = 55, kk (2.15) 

Mi 2 

where T is the lattice size in time. The data for the mesonic correlation 
functions can be fitted to the previous functions, obtaining Mj. The time 
interval which should be used in the fit can be chosen by checking that the 

effective mass me//(t) = log{G{t)/G{t + 1)) reaches a plateau value in the 
time interval considered which agrees with the result of the fit. Preliminary 
results for the mesons mass spectrum are reported in section 5. 
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In the next section we present the Hybrid Monte Carlo algorithm which 
is used to obtain full QCD configurations. 



3 The Hybrid Monte Carlo Algorithm. 

In this section we describe the HMCA that we are using for the full QCD 
simulation. The HMCA generates gauge configurations and pseudofermionic 
fields distributed according to the measure given in eq.(2.7). This is done 
at first suggesting a change to the whole lattice using the $ algorithm^, 
and then globally accepting or rejecting this change in the Metropolis step. 
These two steps are cyclically repeated. 

In the first step, in order to obtain the proposal for the new configuration 
in the $ algorithm, one introduces an Hamiltonian 7i(P, U) which depends 
on the link variables P^{x) and U^{x): 



U^{x) are the standard gauge link variables, while P^{x) play the role of 
Hamiltonian conjugate momenta. The sum of the second and the third term 
in equation (3.1) is exactly the action that appears in the exponential of 
eq.(2.7). The physical observables are not affected by the new term depend- 
ing on P, because it can be factored out becoming gaussian integral that 
only affects the normalization. 

The Hamiltonian variables are initialized in the following way: to gen- 
erate the pseudofermionic fields we extract random complex numbers r] 
distributed as P(?7) oc e'^~^^^\ In this way = M'^t] is distributed accord- 
ing to the pseudofermionic measure of the partition function (2.7). The 
fields have no dynamic and thus there are no 0-conjugate momenta into the 
Hamiltonian. 

In order to generate the P^{x) variables we extract real random numbers 
5f^(a;) which are drawn with probability distribution P{g) oc e*^~^ and then 
we construct: 



H(P, U) = -tr{P^) + Sg{U) + 0t(MtM)-V 



(3.1) 
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P,{x) = Y.9l{x)\ 



a 



(3.2) 



a=l 
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where A° are the Gell-Mann matrices. We want to note that both P,_i{x) and 
variables are generated randomly to ensure ergodicity. 

The (P, U) configuration is evolved according to the Hamiltonian equa- 
tions of motion, which in the QCD case can be obtained in the following 
way. In order for the matrix U^{x) to remain an element of S'f/(3), if P^{x) 
momenta are traceless and hermitian operators, the equation of motion for 
U^{x) must be 

U^{x)=iP^{x)U^{x) . (3.3) 
To obtain the evolution equation for the P^{x), we require the Hamiltonian 
to be a constant of the motionB, Tif = 0. In this way we find 



(3.4) 



To obtain a "trajectory" we have to refresh the P^{x) and the 0(x) vari- 
ables and then to evolve the ([/, P) configuration for a fixed time r. In the 
numerical integration of the Hamiltonian equations we introduce a finite time 
step dt. The number of molecular dynamics steps is Nmd, and the trajectory 
length is r = (it X Nmd- 

We have initially tested the implementation of this algorithm in the ab- 
sence of fermionic fields. Under these conditions the algorithm is used to 
produce quenched pure gauge configurations. These configurations can be 
statistically compared with those ones obtained with well tested existing 
Metropolis program. 

Defining the staples 

M^) = T.{uAx + f^^t)ul{x + iy,t)ul{x,t) 

V 

+Ul{x + /i - z/, t)Ul{x - z/, t)U^{x - u, t)) , (3.5) 
the discrete evolution equations for the "quenched" case can be written as: 



U^{x, t + dt) = e^^''(^'*)'^*f/^(x, t) (3.6) 



P^(x, t + dt)= P^(x, t) + fdtf [U^{x, t)V^{x)] , (3.7) 
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where T is the traceless antihermitian projector: 

f{A) = {A- A^) - Tr{A - A^) . 

The discrete evolutions equations used in the full case are 

Uf,{x, t + dt) = e*^'*(^'*)'^*t/^(x, t) 
P^{x,t + dt) = P^{x,t) 



+ idtT 



^U^{x,t)Vf,{x) - ksea ■ trDirac{U^{x,t)F^{x)) 



(3.8) 



(3.9) 



(3.10) 



where in the third term in eq.(2.21) the trace is over the Dirac indices and 



F^{x) = (1 + 7^)r(a; + ^l)x\x) + (1 - + ^l)Y\x) (3.11) 



with 



Y{x)= M[M^My^(t) 



(3.12) 
(3.13) 



In the previous equations we used an expansion of the exponential up to 
fourth order, which guarantees the time reversibility^. At each step of the 
molecular dynamics evolution we normalized the S'[/(3) link matrices [/^. 

In the time integration of the Hamiltonian equation we use a leap-frog 
algorithm which is time reversible and area preserving in the phase 

space. The reversibility assures the validity of the principle of detailed bal- 
ance. The leap-frog algorithm disentangles the update of U and P variables, 
that are no more evaluated at the same time: we perform an initial halfstep of 
size dt/2, and then a chain of Nmd — 1 steps of size dt, alternatively in U and 
P. Eventually, a last halfstep for the P momenta produces the U{t),P{t) 

^ We have also tried sixth order in the exponential expansion, without any significative 
difference in reversibility. 
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configuration. The deterministically proposed configuration over the whole 
lattice ([/', P') is then globally accepted or rejected with a probability 

^ = min(l,e-(^(^''^')-^™)) . (3.14) 

If we consider the usual Metropolis algorithm applied to full QCD we see 
that in a single sweep we need to make as many inversions of the fermionic 
operator as the number of the links we want to update, (0(V)), while with a 
trajectory of the HMCA we obtain the probability distribution (2.7) with only 
0{1) inversions. The main difference between the two algorithms is that in 
the Metropolis case the links of the new configuration are chosen randomly 
and the accept/reject condition is imposed link by link (locally), while in 
the HMCA the new configuration is obtained by the previous deterministic 
equations (3.3,3.4) and the accept/reject condition is imposed over the whole 
configuration (globally). 

A useful test for the implementation of the HMCA is the identity 0: 

< e-^^ >= 1 . (3.15) 



where 5H = H{P', U') - H{P, U) = H' - H. This identity follows from 
the validity of the Liouville theorem of the Hamiltonian evolution VUT>P = 
VUVP and becomes exact when the system has reached the equilibrium: 



- = - [ VU'VP'e-'^' = - [ VUVPe-V-'^' . (3.16) 
Z Z J Z J 



The (|3.16|) is satisfied for every choice of the algorithm parameters and 
permits a fine control on the proposed configuration. This identity is not 
satisfied if the iterative inversion algorithm for the pseudofermionic operator 
M^M is not successfully completed, with a strong enough degree of conver- 
gence. Then we can check the validity of eq.(3.15) to test the thermalization 
of our configuration and to fix the value of the residue of the inversion. In 
the previous identity the average is taken over all the trajectories (accepted 
or not accepted). A consequence of equation (3.15) is that we expect® that 

<n' -n>> . (3.17) 
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We use periodic boundary condition on the gauge fields. On the fermionic 
field we use periodic boundary condition in the time direction and periodic 
or antiperiodic spatial boundary condition. 

An important feature of the HMCA is the parameter's tuning. There 
are many parameters that have to be tuned: the free technical parameter 
are Nmd, the number of steps of a molecular dynamics (MD) trajectory 
and dt, the length of each step. The free physical ones are P = ^ and 
the hopping parameter kg^a into the fermionic operator M related to the sea 
quark mass. The physical parameters appear both in the evolution equations 
(3.9-3.10) and in the accept/reject Hamiltonian of eq.(3.14), so a further 
possible tuning, which has not been explored in the present paper, is the 
usage in the evolution equations of a Pmd and a /cmd different from the 
ones used in the accept /reject Hamiltonian. 

In our simulations we have fixed Pmd = P, ^md = kgea and the value 
of r = (it X Nmd, and we have studied the behaviour of the acceptance as 
a function of dt. We check that in an exact algorithm like the HMCA the 
value of the plaquette is independent from dt. 

In the full HMCA evolution equations (3.9,3.10), we need the quantities 
reported in eqs.(3.12,3.13). To obtain x{^) and then Y{x) we have to perform 
the inversion of the fermionic operator M^M. 



This is the most time-consuming part of the algorithm and must be re- 
peated Nmd times for each trajectory. To solve eq.(3.18) we use the Conju- 
gate Gradient algorithm described in section 6. We stop the iterative algo- 
rithm when 



(3.18) 



((MtMx-0),(MtMx-0)) 

{x,x) 



< R 



(3.19) 



is lower than a fixed R valuelll. 
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4 Quenched Results 



In this section we summarize our results obtained using both the Hybrid 
Monte Carlo and Hybrid Molecular Dynamics algorithm^ in the pure gauge 
case. We call Hybrid Molecular Dynamics algorithm an HMCA without the 
acceptance step. While the HMCA is an exact algorithm, the observables 
measured on configurations produced with HMD depend on the step size dt 
of the trajectory. 

In order to test our HMCA we performed numerical simulations with only 
the pure gauge part of the action. This means that the proposed configu- 
ration is obtained using the Hamiltonian equations without pseudofermions 
degrees of freedom, see eqs.(3.6-3.7). 

The simplest test we performed is the following: we check that the HMC 
pure gauge algorithm reproduces the same value of the plaquette as the 
Metropolis algorithm in a quenched simulation. Then we analyze the be- 
haviour of the acceptance as a function of the step size dt and finally we 
verify that the identity (3.15) is satisfied for all our HMC simulations. 

We perform numerical simulation on a 6^ lattice at two different value of 
l3, P = 5.7 and P = 6.0. For each value of (3 we studied the HMCA behaviour 
changing the free parameter dt. We choose the number of steps Nmd in order 
to maintain a fixed trajectory length, t = dt x Nmd ~ 0.5. The results for 
the acceptance rates, for the plaquette values and for < e~^^ > are reported 
in Table 1. 

We remark that the value of the plaquette is compatible with the quen- 
ched Metropolis result: for example, on a 6^ lattice, from 4000 sweeps of the 
Metropolis algorithm after 1000 sweeps of thermalization we obtain P(/3 = 
6.0) = 0.5946(4), in agreement with the HMCA pure gauge result P{(3 = 
6.0) = 0.5951(2). Analogously at /3 = 5.7 the Metropolis data is P{/3 = 
5.7) = 0.5484(7) and the HMCA result is 0.5487(6). 

We note, as expected for an exact algorithm like HMCA, that within the 
errors the value of the plaquette does not depend on dt. 

The errors quoted on the plaquette values have been obtained using the 
jacknife method: decreasing the number of the bins we analyze the behaviour 
of the error and we take the plateau value. In figure 1 we report the (3 = 6.0 
plaquette as a function of the number of trajectories. We observe the presence 
of correlations on different scales. This is a well note problem of the HMCA 
and looking at this figure we can conclude that our errors may be underesti- 
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dt 


Nmd 


Ntraj 


Accep. 


Plaq. 


< e-''^ > 


5.7 


0.075 


6 


16000 


16.5% 


0.552(2) 


1.2(3) 


5.7 


0.050 


10 


2000 


56.7% 


0.5491(6) 


1.07(6) 


5.7 


0.025 


20 


4000 


89.5% 


0.5487(6) 


1.002(3) 






6.0 


0.075 


6 


8000 


13.0% 


0.5950(3) 


0.7(1) 


6.0 


0.050 


10 


2000 


54.4% 


0.5947(2) 


0.99(3) 


6.0 


0.025 


20 


2000 


87.4% 


0.5951(2) 


1.000(6) 



Table 1: Hybrid Monte Carlo pure gauge simulations. All the simulations 
have been performed on a 6^ lattice. 



p 


dt 


Nmd 


Ntraj 


Plaq. 


5.7 


0.100 


5 


1000 


0.5315(15) 


5.7 


0.075 


6 


1000 


0.5383(13) 


5.7 


0.050 


10 


1000 


0.5438(11) 


5.7 


0.025 


20 


1000 


0.5500(10) 


6.0 


0.100 


5 


1000 


0.5803(5) 


6.0 


0.075 


6 


1000 


0.5865(5) 


6.0 


0.050 


10 


1000 


0.5917(4) 


6.0 


0.025 


20 


1000 


0.5945(5) 



Table 2: Hybrid Molecular Dynamics pure gauge simulations. All the 
simulations have been performed on a 6^ lattice. 



mated. For this reason usually people prefer to use the Metropolis algorithm 
rather than HMCA to produce quenched uncorrelated configurations. 

Looking at the data of Table 1 we note a sharp variation of the acceptance 
as a function of dt. An increase in the acceptance needs a decrement of the 
value of dt. In order to maintain a given acceptance the step size dt must 
be reduced when going to larger lattices, as we have found in some test 
simulations performed on 18^ x 32 lattices® 0. 



As we underline in section 3, we checked, following ref. |T6[, that our 



HMCA simulations satisfy the identity (3.15). In this way we are sure that 
the thermalization has been done correctly and all the properties of the 
Hamiltonian equations are conserved in the implementation. In the pure 
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gauge simulations we obtain the good results reported in the seventh column 
of Table 1. 

We also perform simulations using the HMDA, on a 6^ lattice at /? = 5.7 
and (3 = 6.0. The results are summarized in Table 2. From the data of Table 
2 we note, as expected, that the value of the plaquette obtained with a HMD 
algorithm depends on dt. In fact the plaquette approaches from lower values 
the quenched result obtained using an HMCA, as noted in ref. pO|. 



5 Unquenched Results 

In this section we present our tests and first results obtained using the 
APEIOO machine for full QCD simulation. 

All the simulations have been performed at /3 = 5.3 and kgea = 0.158 on 
8*^ and on 12^ x 32 lattices. On 8^ we use different values of dt and Nmd- 
In all the simulations we use periodic boundary conditions on the fermionic 
degrees of freedom, while for dt = 0.02, Nmd = 25 we use both periodic and 
antiperiodic spatial boundary conditions on the fermions. 

Our tests consist in the calculation of the average value of the three 
different contributions at the Hamiltonian. We check that the link vari- 
ables initialized according to eq.(3.2) at the begin of each trajectory, are 
distributed according to eq.(3.2) also at the end of each trajectories and 
that < Z^i,^ >= 4.0. We check that the pseudofermionic 

contribution to the action has an average equals to 12.0 and that the pla- 
quette values are statistically comparable with the ones presented in the 
literature0[S. 

To invert the pseudofermionic operator M^M we use a Conjugate Gra- 
dient algorithm (see section 6.). We decrease the value of R (see eq.(3.19)) 
to obtain a variation in STi. of the order of one percent. In the simulation 
reported in this paper we obtain this result for R < 10~^^. For example 
for a given 12^ x 32 configuration (of the set used for the results in Table 
4), varying the value of R in the stop condition of the CG algorithm we have 
found the 6H results reported in Table 5. We note that 1% accuracy in STi 
requires R < 10~^^. 

In these initial test we start our run from random configuration. In- 
deed starting from a configuration with a critical value of the hopping param- 
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dt 


Nmd 


Ac. 


<sn> 


< e-^'^ > 


< 0f(MtM)-V > 


< itr(P^) > 


Plaq. 


0.04 


13 


54% 


0.81(5) 


1.03(6) 


12.005(2) 


3.995(1) 


0.489(1) 


0.02 


25 


88% 


0.06(1) 


1.00(2) 


12.001(2) 


3.999(1) 


0.4894(8) 


0.01 


50 


98% 


0.003(4) 


1.001(4) 


12.002(2) 


3.9991(7) 


0.489(1) 


0.02 


25 


89% 


0.05(1) 


1.01(1) 


12.002(2) 


3.9998(5) 


0.490(1) 



Table 3: Hybrid Monte Carlo full QCD simulations on 8^ lattice, (3 = 5.3, 
ksea = 0.158. The data in the first three rows are obtained using peri- 
odic boundary conditions on the fermionic degrees of freedom, while the last 
row contains data obtained using spatial antiperiodic and temporal periodic 
boundary conditions on the fermions. The number of trajectories for these 
simulations are 890, 678, 640, 967 from the top to the bottom of the Table 
respectively. 



eter /c™** we want to reach the final thermalized configuration with kc = kc{(3). 
If in the thermalization we don't want to pass through a configuration with 
kc equals to the value of kgea of our unquenched simulation we have to start 
from a configuration warmer than the one we want to reach after the ther- 
malization. This means that we can start our full QCD run also from a 
configuration just a little warmer than the final one, for example a pure 
gauge configuration with these properties. Indeed following the studies of 
ref . []18[ , for a given unquenched configuration we can estimate the equivalent 
gauge coupling for the quenched theory which has the same lattice scale. 

The second and the fourth rows of Table 3 contain the data obtained 
using periodic and antiperiodic spatial boundary condition respectively. In 
this case the results are statistically comparable but if we look at figure 2. 
the behaviour of the plaquette versus the number of trajectories seems to be 
different. 

First results for the mass spectrum on 12'^ x 32 lattice are obtained at 
(3 = 5.3. The value of the HMCA parameter are those reported in Table 
4. In this case we have 1180 preliminary trajectories. APEIOO is currently 
running £11 at higher value of ksea- 

In figure 3. we report the behaviour of the plaquette, of 0^(M^M)~^0, 
of —6H and of l^vTT J2x,iit^{P^{^)) for the 12^ x 32 simulation as functions 
of the trajectory number. As can be seen from table 4 the results for the 
testing variables are in good agreement with theoretical predictions and with 
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dt 


Nmd 


Ac. 


<sn> 


< e-''^ > 


< 0t(MtM)-V > 


< itr(P^) > 


Plaq. 


0.01 


60 


92% 


0.04(1) 


1.01(1) 


12.0003(4) 


4.0001(2) 


0.4904(2) 



Table 4: Hybrid Monte Carlo full QCD simulations on 12^ x 32 lattice, 
f3 = 5.3, kgea = 0.158, with periodic boundary conditions on the fermionic 
degrees of freedom. 



R 


5n 


l.Oe-06 


-0.8375 


l.Oe-07 


-0.6997 


l.Oe-08 


0.1123 


l.Oe-09 


0.1842 


l.Oe-10 


0.1878 


l.Oe-12 


0.1712 


5.0e-13 


0.1747 


l.Oe-13 


0.1763 


7.0e-14 


0.1767 



Table 5: The value of STi. as a function of the precision R (see equation 
3.19) of the inversion for a fixed configuration in the set described in Table 
4. 

the plaquette results reported in literature0 . In figure 4 we report the 
average of the correlation functions for the pseudoscalar and for the vector 
mesons at k^aience = ksea = 0.158. In figure 5 we report the behaviour of the 
effective mass for these particles. 

In figure 6 we report the time history of pseudoscalar and vectorial cor- 
relation at euclidean time separation 11. From this figure we observe the 
presence of autocorrelation on different scales^S. 

We start the run with 30 HMD trajectories, then we perform 180 ther- 
malization HMC trajectories. At this point we invert the fermionic operator 
to obtain a propagator for a given k^aience every 5 trajectories. 

In this paper we report a preliminary analysis of the pseudoscalar and 
vectorial correlation functions (see eq (2.11-2.12)) obtained on this ensemble 
of unquenched configurations (see figure 4, 5, 6). 
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6 Technical Aspects. 

In the implementation of the algorithm on the APEIOO machine, some care 
must be devoted to possible problems due to the single-precision of the float- 
ing point units. With the lattices used in the actual full QCD simulations, 
this type of problems are not so important in iterative algorithms with sta- 
ble solution, where, if convergence occurs, the fixed point solution is reached 
anyway, eventually with a larger number of iterations. The difference be- 
tween single and double precision has to be paied in CPU time. However 
some improvements can be obtained taking care in the implementation of 
the inversion algorithm. 
To solve the equation 

M^Mx = 
defining the residue at the n iterative step 

= M^Mxn - 
the Conjugate Gradient (CG) iterative equations are: 

Xo = , Po = Ro 

i^Rnj Rn) 

"""" {MFn,MFn) 

Pn 

Rn+l =Rn- an{M^M) P„ 

^ _ (-Rn+l, -Rn+l) 
(yRni Rn) 

Pn+l = -Rn+l + 

where Xn is the solution after n steps. 

To compute the CG equations (6.4) and (6.7), we need to perform a 
global sum. On APEIOO we use a tree algorithm. While a standard algo- 
rithm adds the items sequentially in a variable, our tree algorithm consists 
in summing the items in pair and in iterating this operation until the total 
sum is reached. Thus the tree algorithm is designed to sum only numbers of 
the same magnitude reducing the rounding errors. 
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(6.1) 
(6.2) 

(6.3) 

(6.4) 

(6.5) 
(6.6) 

(6.7) 

(6.8) 



We are also checking on APEIOO machines a new algorithm proposed by 
Valerio ParisilI3 to correct for rounding errors in summation. This iterative 
algorithm uses a variable to dynamically backing up the informations lost in 
the sum. 



More details of these algorithms will be presented in ref.|TT 



While in the inversion algorithm the rounding errors can reduce the con- 
vergence velocity, in the calculations of 571 they can cause systematic errors, 
because Sli. is used in the accept/reject condition. 

Another technical aspect that we faced is the random number generation. 
In the HMCA simulation we need to initialize the and the variables at 
the start of each new trajectory. With large lattice for a single thermalized 
configuration we have to generate (9(10^°) random numbers. Although on 
the APEIOO machines a random number generator is available and used in 
the quenched simulations since several years, we have been lead to create a 
new generator of real random number with different period, correlation and 
speed properties, in order to check the possible effects on the physical results. 

The problem with APEIOO is that the machine only works with 32 bits 
real numbers and allows a limited amount of mixed operations between real 
and integer numbers, while most of the tested random generators use integers 
in the algorithm and then convert the result into real numbers. We have 



designed^ a new generator'!!^ combining three different algorithms, taking 
care that the rounding error effects do not ruin the statistical and long period 
properties. The mixing of the three generators can produce numbers that 
are more independent than those produced by each algorithm. In order to 
test the implementation of the evolution equations (3.9, 3.10) and the good 
properties of the new random number generator we have checked that, during 
the whole evolution, the variables follow a gaussian distribution. 



7 Full QCD on APEIOO Machines. 

As we said in the Introduction, the APEIOO machines are parallel supercom- 
puters with modular building blocks. At present we are using the "tube" 
configuration, which is composed by 128 processors (nodes, arranged on a 
2 X 2 X 32 three-dimensional structure) and has a peak speed of 6.4 Gflop. 

^We thank G. Parisi for all the long and useful discussion we had with him about this 
subject. 
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In the next future the "tower" configuration will be available, which is com- 
posed of 512 nodes, arranged in a 8 x 8 x 8 configuration with a peak speed of 
25.6 Gfiop. Each node has a memory capacity of 4 Mbytes ( 10^ real 32-bit 
words). With this amount of memory the maximum lattice size we can use 
in the case of a full QCD simulation on the "tube" is 20^ x 32 while on the 
"tower" is 32^. 

The code as it is implemented can achieve a performance of 60 % of 
the peak speed on the most time consuming part (the inversion of the pseud- 
ofermionic M'^M operator and the staple calculations). We are implementing 
a more efficient code using the features of the next APEIOO compiler. 

In the 12^ x 32 simulations reported in Section 5. we obtain an high 
acceptance (92%). From the studies on the 8"^ lattice we see that a good 
value of the acceptance can be reach going from dt = 0.01, A^md = 60 to 
dt = 0.03, A^MD = 20. In this way we gain a factor three in the CPU 
time and we expect to obtain with a "tube" something like 1000 full QCD 
trajectories in 8 days on a 12^ x 32 lattice with a quark mass in the strange 
quark region. Taking into account the autocorrelation modes we can generate 
5 -^ 10 independent configurations in 8 days. 

Looking at all the results of full QCD spectroscopyi , we note that the 
more realistic simulations with two fiavour Wilson fermions have been per- 
formed on 16^ X 32 lattices with low statistics (see for example Table 1 of 
ref. 1^). Looking for example at the results of ref.0, we note the large 
correlations that affect the time history of the pion propagator of fig. 2. This 
is the same situation of fig. 2. a of ref.0. 

This means that, more long runs have to be performed to deeply under- 
standing the long scale correlations before obtaining realistic results. This is 
the first goal that we want to reach. 

With a "tube" we are running on a 12'^ x 32 lattice at /5 = 5.3 and we are 
performing measures on the hadronic and glueballs mass spectrum. With 
a "tower" or with the full 100 Gfiop, 2048 nodes machine, we can simulate 
full QCD on large lattices, comparable with those used today for staggered 
fermions. Large lattices are needed to control the errors coming from finite 
size effects. The study of this systematic error in the full QCD simulations 
will be another point to be understood. 

With these powerful machines will be possible to repeat the calculations 
with dynamical see quarks of all the physical quantities (mass spectrum, 
decay constant, matrix elements, etc.) obtaining eventually the correct full 
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QCD results. 
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FIGURE CAPTIONS 



Figure 1 The plaquette in a quenched (HMC) simulation as a function of the 
number of trajectories on a 6^ lattice for the simulation ai f3 — 6.0, 
dt = 0.025, Nmd = 20. 

Figure 2. a The plaquette in a full (HMC) simulation with periodic boundary con- 
ditions as a function of the number of trajectories on a 8^ lattice for 
the simulation at /3 = 5.3, dt = 0.02, Nmd = 25 kgea = 0.158. 

Figure 2.b Same as fig. 2a for anti periodic spatial boundary conditions as a 
function of the number of trajectories on a 8^ lattice for the simulation 
at /? = 5.3, dt = 0.02, Nmd = 25 ksea = 0.158. 

Figure 3 The plaquette (a), the (j)^{MW)-^(j) (b), the -SH (c), and the 

— I]:r,mu {^) ^ (HMC) simulatiou with periodic 

boundary condition as a function of the number of trajectories on a 
12^ X 32 lattice for the simulation at /9 = 5.3, dt = 0.01, Nmd = 60 
k,ea = 0.158. 

Figure 4 The G^^it) correlation function of eq.(2.11) (a), and the Gkkii) correla- 
tion function of eq.(2.12) (b) in a full (HMC) simulation with periodic 
boundary conditions on a 12^ x 32 lattice for the simulation at /3 = 5.3, 
dt = 0.01, Nmd = 60 ksea = 0.158. 

Figure 5 The pseudoscalar effective mass (a), and vectorial effective mass (b) 
in a full (HMC) simulation with periodic boundary conditions on a 
12=^ X 32 lattice for the simulation at /3 = 5.3, dt = 0.01, Nmd = 60 
ksea = 0.158. 



Figure 6 The G^z{t) correlation function (a) and the Gkk{t) correlation function 
(b) as a function of trajectories number for euclidean time separation 
11 in a full (HMC) simulation with periodic boundary conditions on a 
12^ X 32 lattice for the simulation at /5 = 5.3, dt = 0.01, Nmd = 60 

ksea = 0.158. 
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